Accelerometers (ACCs) can be used to study animal behaviour. The fundamental goal of this package is to help biologists transform data recorded by accelerometers into behaviours, using XGBoost, one of the most promising supervised machine-learning methods for this purpose (refer to <https://arxiv.org/abs/1603.02754>). Unlike the web-based tool “AcceleRater” (http://accapp.move-ecol-minerva.huji.ac.il/), the rabc package does not aim to provide a “one-stop service” turning raw accelerometer data into behaviours. Rather, the rabc package focuses on providing interactive visualization tools that assist its users in handling and interpreting the accelerometer input data, deciding on appropriate behaviour categories, and reducing accelerometer data volume efficiently and effectively (through the calculation and selection of a range of features; for the definition of “feature” in machine learning, refer to https://en.wikipedia.org/wiki/Feature_engineering) without compromising behaviour classification. In brief, this package endeavours to open the lid of the machine-learning “black box”, allowing the integration of your expert knowledge in the development of advanced behaviour classification models.
To understand how ACC data can be used to derive animal behaviours, we need to know how ACCs work. An ACC is an electromechanical device used to measure acceleration forces. Such forces may be static, like the continuous force of gravity, or dynamic, reflecting changes in movement direction and speed or vibrations. When attached to animals, ACCs capture two important features of movement. First, body posture relative to gravity can be derived from the static component of the ACC signal. Second, after the static component is subtracted from the total ACC signal, the remainder reflects the dynamic body acceleration, representing the movements of the ACC-equipped animal. It is worth mentioning that where the ACC is placed on an animal has great impact on what kinds of behaviours can be derived from the ACC data. Take the dairy cow as an example. When the ACC is mounted on a leg, the data can inform on walking behaviour reasonably well, because leg movements are well captured in the dynamic component of the ACC signal. However, a leg-mounted ACC can hardly detect ruminating behaviour (which is obviously an essential behaviour in a ruminant such as a cow). Rumination includes two stages, cud regurgitation and chewing in the mouth. These motions are uncoupled from leg movements and thus can hardly be recorded by a leg-mounted ACC. Similarly, an ear-mounted ACC can detect rumination well but is weak at discriminating walking. Therefore, it is important to consider the exact placement of the ACC device in relation to the specific behaviour(s) one wants to study.
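The static/dynamic decomposition described above can be sketched in a few lines of R. This is purely illustrative (not an rabc function), assuming a centred running mean as a simple estimate of the static component:

```r
# Simulated single-axis ACC signal: gravity (static) plus a periodic
# movement (dynamic). A centred running mean approximates the static
# component; subtracting it leaves the dynamic component.
acc_z <- -9.8 + sin(seq(0, 8 * pi, length.out = 100))

winlen <- 11                                       # illustrative window length
static_z <- stats::filter(acc_z, rep(1 / winlen, winlen), sides = 2)
static_z[is.na(static_z)] <- mean(acc_z)           # pad the window edges
static_z <- as.numeric(static_z)

dynamic_z <- acc_z - static_z                      # movement component
```

The static component tracks posture (here, the gravity offset of roughly -9.8), while the dynamic remainder carries the movement signal used for behaviour classification.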
Many studies have already conducted behaviour classification from ACC data. In most cases, ACC data with corresponding behavioural field observations are used to train behaviour classification models and report their performance. In some cases, specific behaviours in such studies may yield (very) low classification accuracy. Using fewer behaviour classes and aggregating similar behaviours usually yields better classification performance. However, this grouping of behaviours is typically based on biological or ecological considerations and may not always consider the capability of ACC data to discriminate between behaviours. This package, with high-dimensional data visualization at its core, endeavours to assist with such an integrative process.
The general workflow to transform ACC data into behaviours is: (1) the segmentation of raw ACC data into behaviour segments; (2) calculation of features from raw ACC data; (3) feature selection (including feature visualisation); and (4) machine-learning animal-behaviour classification model-training and validation (Fig.1). The rabc package has associated visualisation steps in this workflow to facilitate the process.
Figure 1. The general workflow to transform ACC data into behaviours with associated rabc data visualisation steps to facilitate the process.
For raw ACC data segmentation, there are two choices: even-length segmentation and variable-length segmentation. Variable-length segmentation requires an algorithm to detect behaviour change points and may thus be prone to error. Even-length segmentation does not require these additional calculations and is therefore much easier to implement. However, even-length ACC segments will inevitably contain behaviour change points (and thus multiple behaviours), affecting downstream processing and behaviour classification. An ACC segment should be sufficiently long to contain enough data to be representative of a behaviour, whereas its length should be limited to avoid the inclusion of multiple behaviours as much as possible. In general, large-bodied animals sustain a single behaviour for longer than small animals. For example, in large herbivores, resting and ruminating behaviours usually last for tens of minutes; in contrast, a small bird might change its behaviour every couple of seconds. Therefore, the segmentation setting should follow the activity characteristics of the species under study. Regarding the inevitable segments in which behaviour transitions take place, I recommend retaining these segments in the model training. Although these data might decrease the accuracy of the classification model, they will make the model more robust and avoid overestimating model performance.
The rabc package only supports even-length segmentation data. The input should be a data.frame or tibble containing the raw ACC data together with the behaviour associated with each segment. For tri-axial ACC data, each row of equal length should be arranged as “x,y,z,x,y,z,…,behaviour”, where “behaviour” is the (primary) behaviour observed during that segment. For dual-axial ACC data, it should be arranged as “x,y,x,y,…,behaviour”, and for single-axial ACC data as “x,x,…,behaviour”.
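For example, a tiny tri-axial input in this format could be built as follows (hypothetical values; two segments of three measurements each, whereas a real dataset would contain many more segments with many more measurements):

```r
# Two tri-axial segments, each row arranged as x,y,z,x,y,z,x,y,z,behaviour
seg1 <- c(0.1, 0.2, -9.8,   0.2, 0.1, -9.7,   0.1, 0.3, -9.8)  # mostly static
seg2 <- c(2.1, 1.5, -8.0,   3.4, 0.9, -7.2,   2.8, 1.1, -8.5)  # more dynamic
acc_df <- data.frame(rbind(seg1, seg2))
acc_df$behaviour <- c("STND", "WALK")   # observed behaviour per segment
```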
Visualising the raw ACC data gives the user a convenient impression of how the ACC signal relates to the different behaviours and can also be used for data quality control. For raw ACC visualization, the rows are first ordered by behaviour, but otherwise left in the (time) sequence of the original dataset, using the order_acc function, which returns a data.frame or tibble with rearranged rows. All other functions in the rabc package expect the data in this ordered format.
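Conceptually, this ordering amounts to a stable sort on the behaviour column (the last column), which keeps the original time sequence within each behaviour. A base-R sketch of the idea (not the rabc implementation):

```r
# Stable sort of segments by the behaviour label in the last column;
# ties (segments of the same behaviour) keep their original order,
# because order() sorts ties stably.
order_by_behaviour <- function(df) {
  df[order(df[[ncol(df)]]), , drop = FALSE]
}
```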
library(rabc)
data("whitestork_acc")
#The white stork (Ciconia ciconia) tri-axial accelerometer data (reference: <http://accapp.move-ecol-minerva.huji.ac.il/demo/>) were measured at 10.54 Hz. Forty tri-axial measurements, totalling 3.8 seconds, form one segment. The dataset includes 1746 segments, each forming a row. Each row contains 121 columns: the first 120 columns are accelerometer measurements from three orthogonal axes, arranged as x,y,z,x,y,z,...,x,y,z, and the final column is of type character and contains the behaviour type. The dataset contains 5 behaviours: "A_FLIGHT" - active flight (77 cases), "P_FLIGHT" - passive flight (96), "WALK" - walking (437), "STND" - standing (863), "SITTING" - resting (273).
whitestork_acc_sorted <- order_acc(df_raw = whitestork_acc) # sort the data by behaviour
For ACC data visualization, the rabc package uses function dygraph (from the dependent package dygraphs) to plot all ACC segments by behaviour. The x axis of this graph indicates the row sequence number of the sorted dataset.
plot_acc(df_raw = whitestork_acc_sorted, axis_num = 3)
Figure 2. Dynamic graph of tri-axial ACC data.
Using the range selector under the x-axis you can zoom in. To illustrate its use, zoom in on the area around segments 60 to 70, which shows that the ACC data in this range are very different from neighbouring segments. Albeit all being labelled as A_FLIGHT, these data resemble P_FLIGHT behaviour, warranting their scrutiny and, potentially, their behaviour reclassification.
This function also allows for single and dual-axis data visualization, as illustrated below where only the ACC x-axis is displayed.
plot_acc(df_raw = dplyr::bind_cols(whitestork_acc_sorted[, seq(1, 118, by = 3)],
whitestork_acc_sorted[, 121, drop = FALSE]), axis_num = 1)
Figure 3. Dynamic graph of single-axial ACC data.
The next step is to calculate features from the ACC data. The features will form the input to the machine-learning models. Using the functions calculate_feature_time and calculate_feature_freq, two basic feature sets are calculated. The first, time-domain feature set includes: mean, variance, standard deviation, max, min, range and ODBA, where ODBA is short for Overall Dynamic Body Acceleration. In many studies, this value has been shown to correlate with the animal’s energy expenditure (see Wilson et al. 2006 doi:10.1111/j.1365-2656.2006.01127.x for more details). These features are calculated for each segment and each ACC axis separately (denoted with prefix x, y, z), except for ODBA, which is calculated using all available axes. For the ODBA calculation, a running mean is taken from the raw data of each axis of acceleration within each segment, following Shepard et al. (2008 doi:10.3354/ab00104). The length of the running-mean window is defined by the winlen_dba argument.
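A sketch of this ODBA calculation for a single segment, following the description above (illustrative only, not the rabc source code; it assumes the segment is supplied as a matrix with one column per axis):

```r
# ODBA for one segment: per axis, subtract a running mean (the static
# component), take the mean absolute remainder (dynamic body
# acceleration), and sum across axes. winlen mirrors winlen_dba.
odba_segment <- function(seg_xyz, winlen = 11) {
  dba_per_axis <- apply(seg_xyz, 2, function(ax) {
    static <- stats::filter(ax, rep(1 / winlen, winlen), sides = 2)
    static[is.na(static)] <- mean(ax)        # pad the window edges
    mean(abs(ax - as.numeric(static)))       # mean dynamic acceleration
  })
  sum(dba_per_axis)                          # overall dynamic body acceleration
}
```

A motionless tag (constant signal on every axis) yields an ODBA of zero, matching the intuition that ODBA reflects movement intensity.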
Note that calculate_feature_time does not provide an exhaustive list of potential time-domain features (for more features see Brown et al. 2013 <doi:10.1186/2050-3385-1-20>). Since feature engineering can substantially improve the performance of machine-learning models, users may want to consider calculating custom features. All functions in this package that operate on the feature data.frame or tibble produced by calculate_feature_time and calculate_feature_freq also work with custom features. Users can combine feature groups with cbind or bind_cols (dplyr package).
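As an illustration, a custom per-segment feature (here a simple skewness, a hypothetical choice not provided by rabc) can be calculated and then combined with the built-in feature sets via bind_cols:

```r
# Simple (biased) skewness of a numeric vector -- a hypothetical
# custom feature, not part of rabc.
skewness <- function(v) mean((v - mean(v))^3) / stats::sd(v)^3

# In the tri-axial layout, the x-axis columns are 1, 4, ..., 118.
x_cols <- seq(1, 118, by = 3)
# Applied to the stork data and combined with the built-in features
# (once df_time has been calculated, as in the next code block):
# df_custom <- data.frame(
#   x_skew = apply(whitestork_acc_sorted[, x_cols], 1, skewness))
# df_all <- dplyr::bind_cols(df_time, df_custom)  # input for select_features
```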
df_time <- calculate_feature_time(df_raw = whitestork_acc_sorted, winlen_dba = 11)
label_vec <- whitestork_acc_sorted$V121
head(df_time)
## x_mean y_mean z_mean x_variance y_variance z_variance x_sd
## 1 -0.4685696 1.5659006 -12.026997 0.9977744 0.9860719 8.997329 0.9988866
## 2 3.4204731 0.2065264 -9.116610 7.3893796 4.7942707 29.787906 2.7183413
## 3 4.2400500 -0.1487214 -11.448715 3.9481838 1.1529715 10.028466 1.9870037
## 4 2.3822669 0.6914143 -9.049167 5.1439902 7.6477368 57.991724 2.2680366
## 5 1.7694251 0.4270830 -9.034117 6.5174802 3.3970576 28.635515 2.5529356
## 6 -2.5507145 -0.2252583 -9.158644 8.5318214 3.8914935 5.475554 2.9209282
## y_sd z_sd x_max y_max z_max x_min y_min
## 1 0.9930115 2.999555 2.452568 3.746303 2.997803 -4.0531915 -2.749085
## 2 2.1895823 5.457830 12.392772 6.914143 4.347613 -3.0812012 -4.691740
## 3 1.0737651 3.166775 11.599816 4.534600 2.563977 0.6343649 -2.065264
## 4 2.7654542 7.615230 7.068638 5.544784 5.774522 -2.8093305 -3.928491
## 5 1.8431109 5.351216 8.201433 4.579498 4.503681 -5.4827256 -4.063182
## 6 1.9726869 2.339990 7.514529 5.991871 -1.359353 -9.9356241 -4.009598
## z_min x_range y_range z_range ODBA
## 1 -20.66053 6.505760 6.495388 23.65834 1.410874
## 2 -20.80166 15.473973 11.605883 25.14927 6.317930
## 3 -16.67700 10.965451 6.599864 19.24097 2.500854
## 4 -19.26327 9.877968 9.473274 25.03779 10.562662
## 5 -16.61011 13.684158 8.642679 21.11379 4.900801
## 6 -16.58411 17.450154 10.001468 15.22476 3.059095
summary(df_time)
## x_mean y_mean z_mean x_variance
## Min. :-4.6093 Min. :-5.683 Min. :-12.027 Min. : 0.00483
## 1st Qu.: 0.4112 1st Qu.: 0.883 1st Qu.: -9.377 1st Qu.: 0.01692
## Median : 1.5697 Median : 3.586 Median : -8.964 Median : 0.09107
## Mean : 1.0887 Mean : 3.245 Mean : -8.723 Mean : 0.77924
## 3rd Qu.: 2.4340 3rd Qu.: 5.418 3rd Qu.: -8.127 3rd Qu.: 0.44906
## Max. : 5.9543 Max. : 9.306 Max. : -4.467 Max. :92.02646
## y_variance z_variance x_sd y_sd
## Min. : 0.00523 Min. : 0.00409 Min. :0.06947 Min. :0.07232
## 1st Qu.: 0.01796 1st Qu.: 0.01531 1st Qu.:0.13009 1st Qu.:0.13400
## Median : 0.09677 Median : 0.06617 Median :0.30178 Median :0.31107
## Mean : 1.57808 Mean : 2.74368 Mean :0.53668 Mean :0.81489
## 3rd Qu.: 2.22673 3rd Qu.: 0.77740 3rd Qu.:0.67012 3rd Qu.:1.49222
## Max. :44.76134 Max. :130.16655 Max. :9.59304 Max. :6.69039
## z_sd x_max y_max z_max
## Min. : 0.06399 Min. :-4.020 Min. :-2.825 Min. :-11.019
## 1st Qu.: 0.12375 1st Qu.: 1.314 1st Qu.: 2.678 1st Qu.: -8.636
## Median : 0.25724 Median : 2.495 Median : 5.762 Median : -7.616
## Mean : 0.77280 Mean : 2.432 Mean : 4.955 Mean : -7.096
## 3rd Qu.: 0.88170 3rd Qu.: 3.529 3rd Qu.: 7.173 3rd Qu.: -6.778
## Max. :11.40906 Max. :45.144 Max. :27.925 Max. : 17.638
## x_min y_min z_min x_range
## Min. :-27.323 Min. :-17.1703 Min. :-42.295 Min. : 0.2065
## 1st Qu.: -1.677 1st Qu.: -0.6678 1st Qu.:-11.489 1st Qu.: 0.5885
## Median : 0.573 Median : 1.4632 Median : -9.659 Median : 1.3998
## Mean : -0.217 Mean : 1.5277 Mean :-10.416 Mean : 2.6493
## 3rd Qu.: 1.679 3rd Qu.: 4.7061 3rd Qu.: -8.667 3rd Qu.: 3.3123
## Max. : 5.485 Max. : 9.0019 Max. : -4.726 Max. :69.8735
## y_range z_range ODBA
## Min. : 0.2965 Min. : 0.2398 Min. : 0.1601
## 1st Qu.: 0.5902 1st Qu.: 0.5401 1st Qu.: 0.2771
## Median : 1.3554 Median : 1.1567 Median : 0.5247
## Mean : 3.4276 Mean : 3.3193 Mean : 1.4563
## 3rd Qu.: 5.6943 3rd Qu.: 4.0660 3rd Qu.: 1.9064
## Max. :37.6559 Max. :56.9176 Max. :20.4925
The function calculate_feature_freq calculates the basic frequency-domain features: main frequency, main amplitude and frequency entropy. Frequency entropy is a measure of the unpredictability of the signal. For example, when an animal is engaging in a rhythmic behaviour, such as a stork in active (i.e. flapping) flight, future ACC signals can be predicted from current ACC signals with considerable accuracy. In contrast, when an animal is engaging in behaviour with more unpredictable movements, such as foraging in the case of the stork, the entropy value will be higher (see https://en.wikipedia.org/wiki/Entropy_(information_theory) for more information). Like the time-domain features, the frequency-domain features are calculated for each segment and each ACC axis separately (denoted with prefix x, y, z). Calculation of these features is based on the fast Fourier transform of the raw ACC data. In the calculate_feature_freq function, the required argument samp_freq denotes the sampling frequency of the ACC in Hz.
It is worth noting that if segment lengths are too short and/or sampling frequency too low the frequency domain features may have reduced information value, potentially rendering them of limited use for classification model building.
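To illustrate what these features capture, here is a sketch of computing a main frequency, its amplitude and a normalised spectral entropy for one axis of one segment with R’s fft (illustrative only; rabc’s internal implementation may differ in details such as normalisation):

```r
# Frequency-domain features of a single-axis segment via the FFT.
freq_features <- function(seg, samp_freq) {
  n     <- length(seg)
  nbins <- n %/% 2
  spec  <- abs(fft(seg - mean(seg)))[1:nbins]           # one-sided spectrum
  freqs <- (0:(nbins - 1)) * samp_freq / n              # frequency of each bin
  p     <- spec / sum(spec)                             # normalise to sum to 1
  ent   <- -sum(p[p > 0] * log(p[p > 0])) / log(nbins)  # entropy in [0, 1]
  c(freqmain = freqs[which.max(spec)],                  # dominant frequency
    freqamp  = max(spec),                               # its amplitude
    entropy  = ent)
}
```

A pure sine wave (rhythmic, predictable) concentrates its spectrum in one bin and gives a low entropy, whereas noisy, irregular signals spread energy across bins and give entropy closer to 1.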
df_freq <- calculate_feature_freq(df_raw = whitestork_acc_sorted, samp_freq = 10.54)
head(df_freq)
## x_freqmain y_freqmain z_freqmain x_freqamp y_freqamp z_freqamp x_entropy
## 1 4.4795 3.4255 5.0065 9.853683 9.236805 27.25474 0.3947224
## 2 3.1620 3.1620 3.1620 32.068400 30.751044 91.09435 0.3926880
## 3 3.4255 1.8445 0.7905 20.645999 11.359832 36.46924 0.4071575
## 4 3.1620 3.1620 3.1620 35.736536 46.742950 146.86325 0.3705348
## 5 3.1620 5.2700 0.5270 30.034857 23.540647 70.72132 0.3961994
## 6 5.0065 0.5270 1.5810 31.048166 31.320632 24.76596 0.4028670
## y_entropy z_entropy
## 1 0.4209778 0.4176777
## 2 0.3802956 0.3570070
## 3 0.3891655 0.4156966
## 4 0.3367338 0.3208475
## 5 0.3853537 0.3797433
## 6 0.3823537 0.4080527
summary(df_freq)
## x_freqmain y_freqmain z_freqmain x_freqamp
## Min. :0.527 Min. :0.5270 Min. :0.527 Min. : 0.6521
## 1st Qu.:1.054 1st Qu.:0.7905 1st Qu.:1.317 1st Qu.: 1.5006
## Median :2.898 Median :1.5810 Median :2.635 Median : 3.3258
## Mean :2.830 Mean :2.0823 Mean :2.543 Mean : 6.5831
## 3rd Qu.:4.216 3rd Qu.:3.1620 3rd Qu.:3.689 3rd Qu.: 7.9136
## Max. :5.270 Max. :5.2700 Max. :5.270 Max. :94.2119
## y_freqamp z_freqamp x_entropy y_entropy
## Min. : 0.7369 Min. : 0.6739 Min. :0.2851 Min. :0.1926
## 1st Qu.: 1.5265 1st Qu.: 1.4138 1st Qu.:0.3846 1st Qu.:0.3648
## Median : 3.5075 Median : 2.8035 Median :0.3941 Median :0.3864
## Mean :11.9628 Mean : 11.3813 Mean :0.3919 Mean :0.3779
## 3rd Qu.:21.9006 3rd Qu.: 11.1779 3rd Qu.:0.4024 3rd Qu.:0.3991
## Max. :98.6558 Max. :232.8558 Max. :0.4241 Max. :0.4248
## z_entropy
## Min. :0.1612
## 1st Qu.:0.3773
## Median :0.3907
## Mean :0.3857
## 3rd Qu.:0.4009
## Max. :0.4318
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction (https://en.wikipedia.org/wiki/Feature_selection). In animal behaviour studies using ACC, typically tens of features are used in model construction. Although this is a relatively small number compared to many other machine-learning applications, there may still be redundancy in the feature set. Redundant features are features that show high correlation with other features and contribute similarly to model construction; they can also be “irrelevant” features that hardly contribute to model construction at all. Feature selection has three aims. Firstly, fewer features make the model easier to interpret; researchers might, for instance, more easily find biomechanical connections between features and the ultimate classification model. Secondly, as the behaviour classification model is often used for prediction, fewer features can enhance generalization by reducing overfitting. Thirdly, because of the lower computational requirements in assessing behaviour from ACC data, reduced feature sets have greater potential to be used on board the ACC devices themselves, e.g. in light-weight tracking devices.
In the rabc package, a combination of a filter and a wrapper feature selection method is used by the function select_features. The filter part has the purpose of removing redundant features. The filtering method uses the absolute values of pair-wise correlation coefficients between features. If two variables have a high correlation, the function looks at the mean absolute correlation of each variable and removes the variable with the largest mean absolute correlation. The threshold correlation coefficient (cutoff) is user-defined, with a default of “cutoff = 0.9”. By default, the filter is turned off (i.e. “filter = FALSE”).
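The filtering rule described above can be sketched in base R (an illustrative reimplementation, not the rabc source; caret’s findCorrelation implements the same idea):

```r
# Greedy correlation filter: repeatedly find the most-correlated pair
# of features above the cutoff and drop the member with the larger
# mean absolute correlation to all other features.
corr_filter <- function(df, cutoff = 0.9) {
  cm <- abs(cor(df)); diag(cm) <- 0
  keep <- colnames(df)
  while (max(cm) > cutoff) {
    pair <- which(cm == max(cm), arr.ind = TRUE)[1, ]   # indices of the pair
    drop <- names(which.max(rowMeans(cm)[pair]))        # worse of the two
    keep <- setdiff(keep, drop)
    cm   <- cm[keep, keep, drop = FALSE]
  }
  df[, keep, drop = FALSE]
}
```

Applied to the stork features this would be, e.g., corr_filter(dplyr::bind_cols(df_time, df_freq), cutoff = 0.9).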
The purpose of the wrapper is to select the most relevant features. The wrapper part uses stepwise forward selection (SFS) with extreme gradient boosting (XGBoost), which is used not only for feature selection but also for the final classification model (see below). XGBoost is a scalable tree boosting method that has proven superior to other tree boosting methods and random forests, showing very good performance in Kaggle machine-learning competitions. We have also experienced ourselves that XGBoost is fast to train and performs well with limited numbers of trees, which typically applies to animal behaviour classification tasks. The number of features to select (no_features) can be set by the user and defaults to “no_features = 5”. This parameter also determines how many rounds of SFS are conducted. In the first round, each feature is individually used to train a classification model with XGBoost, and the feature with the highest overall accuracy is added to the selected feature set. Then, in every following round, each remaining feature is combined with the selected feature set to train a classification model, and the one yielding the highest accuracy is added to the selected feature set. The process stops when the number of rounds equals the no_features setting.
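The SFS procedure can be summarised as follows (a simplified sketch; accuracy_of is a hypothetical stand-in for training and validating an XGBoost model on a given feature subset, which is what select_features does internally):

```r
# Stepwise forward selection: each round adds the single feature that,
# combined with those already chosen, gives the highest accuracy.
sfs <- function(feature_names, accuracy_of, no_features = 5) {
  selected <- character(0)
  for (i in seq_len(no_features)) {
    remaining <- setdiff(feature_names, selected)
    scores    <- vapply(remaining,
                        function(f) accuracy_of(c(selected, f)),
                        numeric(1))
    selected  <- c(selected, remaining[which.max(scores)])
  }
  selected
}
```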
The select_features function will return a list of which the first member (i.e., [[1]]) contains a matrix providing the classification accuracy for each of the features (columns) across all steps (rows, top row being first step) of the SFS process. Once a feature is selected into the selected feature set, the remaining values in this feature’s column are set to zero. The second member of the list (i.e., [[2]]) contains the names of the selected features in the order in which they were selected in the SFS process. The development of the classification accuracy with each step in the SFS process is plotted with function plot_selection_accuracy.
selection <- select_features(df_feature = dplyr::bind_cols(df_time, df_freq),
vec_label = label_vec, filter = FALSE,
no_features = 10)
selection[[1]]
## z_range y_max y_variance z_min y_mean z_mean y_freqmain
## round1 0.7325069 0.6329420 0.7010841 0.7102006 0.6449614 0.5452982 0.5755974
## round2 0.0000000 0.8808807 0.7887082 0.8012403 0.8688119 0.8138320 0.7267693
## round3 0.0000000 0.0000000 0.9061064 0.9009009 0.8992061 0.9015249 0.8923588
## round4 0.0000000 0.0000000 0.0000000 0.9146909 0.9129832 0.9094956 0.9072690
## round5 0.0000000 0.0000000 0.0000000 0.0000000 0.9221362 0.9169570 0.9175941
## round6 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.9243956 0.9227273
## round7 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.9255615
## round8 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## round9 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## round10 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## y_range y_sd x_freqmain x_mean x_variance z_variance
## round1 0.6947063 0.7010841 0.4862692 0.5022887 0.6563871 0.7152884
## round2 0.7766130 0.7887082 0.7319422 0.7720577 0.7313807 0.7268121
## round3 0.9043592 0.9061064 0.8866083 0.8900501 0.8854589 0.8940601
## round4 0.9078140 0.9061064 0.9083954 0.9135514 0.9118272 0.9124184
## round5 0.9221165 0.9146909 0.9158371 0.9192822 0.9169930 0.9181395
## round6 0.9238439 0.9221362 0.9227208 0.9232888 0.9192658 0.9187043
## round7 0.9232561 0.9243956 0.9232560 0.9249834 0.9232593 0.9221199
## round8 0.9272792 0.9255615 0.9238471 0.9232724 0.9244218 0.9238407
## round9 0.0000000 0.9272792 0.9267076 0.9215647 0.9244055 0.9249770
## round10 0.0000000 0.0000000 0.9267076 0.9215647 0.9244055 0.9249770
## x_sd z_sd x_max z_max x_min y_min x_range
## round1 0.6563871 0.7152884 0.5761983 0.5727931 0.5922157 0.6850417 0.6283173
## round2 0.7313807 0.7268121 0.7789117 0.7978185 0.7754306 0.8557018 0.7330752
## round3 0.8854589 0.8940601 0.8872125 0.8871470 0.8797085 0.9026644 0.8900369
## round4 0.9118272 0.9124184 0.9072360 0.9123528 0.9049535 0.9129733 0.9072327
## round5 0.9169930 0.9181395 0.9146943 0.9158241 0.9146975 0.9209998 0.9158240
## round6 0.9192658 0.9187043 0.9215745 0.9192429 0.9221363 0.9209801 0.9175515
## round7 0.9232593 0.9221199 0.9232724 0.9209636 0.9215385 0.9209539 0.9204087
## round8 0.9244218 0.9238407 0.9255746 0.9226880 0.9221165 0.9215319 0.9209867
## round9 0.9238341 0.9249770 0.9255582 0.9232496 0.9198340 0.9249507 0.9244154
## round10 0.9238341 0.9249770 0.9255582 0.9232496 0.9198340 0.9249507 0.9244154
## ODBA z_freqmain x_freqamp y_freqamp z_freqamp x_entropy y_entropy
## round1 0.7170581 0.5074905 0.6409252 0.6947396 0.7038529 0.4770408 0.5910360
## round2 0.7571673 0.7571381 0.7371405 0.7749053 0.7262047 0.7130514 0.7244477
## round3 0.9038304 0.8917645 0.8860337 0.9032426 0.8866249 0.8854720 0.8871929
## round4 0.9106779 0.9089832 0.9106943 0.9083824 0.9095350 0.9095251 0.9089636
## round5 0.9146812 0.9181262 0.9124117 0.9141295 0.9135482 0.9152394 0.9209867
## round6 0.9198340 0.9204284 0.9204349 0.9187109 0.9221231 0.9221230 0.9175351
## round7 0.9209573 0.9221165 0.9221229 0.9215352 0.9180804 0.9226845 0.9215418
## round8 0.9198340 0.9215450 0.9204119 0.9244088 0.9203858 0.9215417 0.9232725
## round9 0.9221133 0.9232626 0.9215419 0.9221034 0.9209606 0.9215220 0.9249770
## round10 0.9221133 0.9232626 0.9215419 0.9221034 0.9209606 0.9215220 0.9249770
## z_entropy
## round1 0.4982922
## round2 0.7371177
## round3 0.8900403
## round4 0.9043887
## round5 0.9141063
## round6 0.9192164
## round7 0.9203923
## round8 0.9181131
## round9 0.9209473
## round10 0.9209473
selection[[2]]
## [1] "z_range" "y_max" "y_variance" "z_min" "y_mean"
## [6] "z_mean" "y_freqmain" "y_range" "y_sd" "x_freqmain"
plot_selection_accuracy(results = selection)
Figure 4. The accuracy plot of selected features. The red line and dots denote classification accuracies of the accumulating selected feature set. Grey bars indicate the accuracy increase (labelled with numbers) of each selected feature.
We can see that after the sixth selected feature, “z_mean”, there is almost no further improvement in classification accuracy with the addition of more features. Users can define the cut-off point when there is no improvement of accuracy or the improvement of accuracy is less than a certain threshold.
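For example, such a threshold rule could be implemented as follows (a hypothetical helper, not part of rabc), applied to the accuracy matrix in selection[[1]]:

```r
# Number of features to keep: stop before the first SFS round whose
# accuracy gain over the previous round falls below `thr`.
choose_n_features <- function(acc_matrix, thr = 0.005) {
  round_acc <- apply(acc_matrix, 1, max)  # best accuracy in each round
  gains     <- diff(round_acc)            # gain contributed by each added feature
  below     <- which(gains < thr)
  if (length(below) == 0) length(round_acc) else below[1]
}
# e.g. selection[[2]][seq_len(choose_n_features(selection[[1]]))]
```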
Above, under “feature selection”, we already mentioned the three objectives of feature selection: improving interpretability, reducing overfitting and reducing computational requirements. Visualisation of the features can further assist in deciding on the features to use in the ultimate behaviour classification model, but its main use is in deciding whether any behaviour types should be combined to ultimately improve behaviour classification performance. Alternatively, the visualisation may also lead to considering splitting existing behaviour types into multiple behaviours. In other words, this visualisation aids in evaluating the behaviour set.
The rabc package offers three ways to visualize the time and frequency domain features. The first two visualise the features in isolation whereas the third is an integrative approach where entire feature domains are analysed collectively. The first of the visualisation methods, plot_feature, draws individual values of features in sequence and the second, plot_feature_grouped, depicts the distribution of all the features by behaviour type. The third, integrative approach uses Uniform Manifold Approximation and Projection (UMAP), which is a nonlinear dimensionality-reduction technique, suitable for high dimensional data visualization.
We will start off with the two methods where features are inspected in isolation. While these visualisations will be exemplified for the time domain features only, these methods can also be used for the frequency domain features.
Similar to ACC data visualization, function plot_feature allows users to generate a visual impression of the feature values by behaviour types, where the x axis of the graph indicates the row sequence number of the sorted dataset.
Before calling the function plot_feature, the behaviour labels in the sorted ACC dataset need to be extracted into a vector.
label_vec <- whitestork_acc_sorted[, ncol(whitestork_acc_sorted)]
#assign behaviour labels to a vector
plot_feature(df_feature = df_time, vec_label = label_vec)
Figure 5. All time domain features sequence plot.
The plot containing all time domain features simultaneously can be overwhelming. However, one can also plot a subset of these features. For example:
plot_feature(df_feature = df_time[, grep("mean", names(df_time))], vec_label = label_vec)
Figure 6. Mean values of three axes sequence plot.
The function plot_feature_grouped generates feature distributions grouped by behaviour types. The distributions can be plotted as density plots and as boxplots (also violin plots). When one or more features have very dissimilar distributions between behaviours, these features have great potential to be of value in behaviour classification modelling. Conversely, when features have very similar distributions across behaviours, they can hardly contribute to a behaviour classification model.
plot_feature_grouped(df_feature = df_time, vec_label = label_vec, geom = "density")
Figure 7. Density plots of all time domain features.
plot_feature_grouped(df_feature = df_time, vec_label = label_vec, geom = "boxplot")
Figure 8. Boxplots of all time domain features.
In many of the density plots we observe clearly different distributions for the various behaviours (e.g. in y_mean, y_max, z_max, y_min, z_min, ODBA), which makes these features potentially good candidates for behaviour classification modelling. In x_mean we oddly observe two peaks in each of the five behaviours. In the white stork dataset, which was collected using backpack ACCs, x represents the horizontal axis perpendicular to the direction of the spine (i.e. the “sway axis”). The remarkable two-peak distributions in x_mean may therefore result from off-centre positioning of the ACCs, with some backpacks positioned slightly to the left of the spine and others slightly to the right. In this case, we should therefore be cautious about including features from the x axis in a behaviour classification model.
The ODBA boxplots suggest clear differentiation of behaviours by ODBA with active flight > walking > passive flight > standing > sitting, clearly representing the differences in relative movement intensity of the five behaviours.
This is a very important part of this package, so I will use several sections to introduce and illustrate it, covering the ethogram, animal activities recorded by the accelerometer, UMAP for high-dimensional data visualization, and how to combine ethogram and accelerometer data through UMAP.
Uniform Manifold Approximation and Projection (UMAP) is a very powerful nonlinear dimensionality-reduction technique that is also highly suitable for high-dimensional data visualization. On the downside, UMAP lacks the strong interpretability of, for instance, Principal Component Analysis (PCA): contrary to PCA, the dimensions of the UMAP embedding space have no specific meaning (see https://arxiv.org/pdf/1802.03426.pdf). UMAP has found its niches in bioinformatics, material sciences and machine learning. Within the broad field of biology, it has been used in bioacoustics studies, but it has rarely been used in animal behaviour studies. As mentioned earlier, the usual pathway to transform ACC data into behaviours is: segmentation of raw ACC data - calculation of features from the segmented ACC data - feature selection (although not strictly required) - machine-learning animal behaviour classification model training and validation. UMAP can be a very powerful tool for transforming a large number of features (usually dozens) into a two- or higher-dimensional plot (UMAP can collapse the data into any number of dimensions, but here we use two dimensions only, for purposes of visualization). The data points in the two-dimensional space are plotted with different colours, each colour reflecting a different behaviour. The optimal scenario one strives for is that each behaviour forms an isolated cluster within this two-dimensional space. In this way, UMAP already provides an indication of how the final classification model will perform, with isolated behaviour clusters indicating high classification accuracy. Where clusters overlap, researchers may wish to consider grouping the behaviours involved, because they may not be adequately separated using ACC data. Conversely, where a single behaviour is spread out over the plot, reclassifying it into multiple behaviour types may be a possibility.
Before calling UMAP, one should make sure that variables “df_time” (time domain feature set, a data.frame or tibble), “df_freq” (frequency domain feature set, a data.frame or tibble) and “label_vec” (corresponding behaviours, a character vector) exist.
UMAP is called using the function:
#not run
#plot_UMAP()
For this tutorial, there is an example online Shiny app of plot_UMAP at https://huiyu-deakin.shinyapps.io/rabc_UMAP/.
Within the UMAP visualization Shiny App, one can select three panels representing three different functions: “UMAP calculation and tuning”, “Feature visualization through UMAP” and “Selected features”, which we discuss in turn below:
The “Features to input” radio buttons allow users to choose which feature groups (i.e. time domain, frequency domain, or both) to use as input to UMAP. In this package, there are a total of 28 features for the time (19) and frequency (9) domains. Considering that UMAP is not constrained by the number of features (it can handle thousands of dimensions at a time) and that more features may better represent the data structure, the default is to include both time- and frequency-domain features. Yet, users are encouraged to also try the time- or frequency-domain features in isolation to see how well each on its own is able to inform on behaviour.
Hyperparameters are parameters that need to be set by the user to optimally assist the machine-learning process, and the tuning of hyperparameters can be a subtle and tedious process of trial and error that the UMAP visualization Shiny App endeavours to expedite (for further explanation of hyperparameters see http://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)). The Hyperparameter tuning part gives users a tool to tune key parameters of the UMAP function. Number of neighbours (i.e. n_neighbors in the R function umap) controls how UMAP balances local versus global structure in the data, with low values forcing UMAP to focus on local structure. The default setting of Number of neighbours is 15. The Minimum distance (min_dist) controls how tightly UMAP is allowed to pack points together; a small Minimum distance will result in more clumped data points. The default setting is 0.1. Finally, the Distance metric (metric) determines how the distance between data points is computed. The drop-down options include “euclidean”, “manhattan”, “cosine” and “pearson”; the default is “euclidean”. The default values of these three hyperparameters are inherited from the invoked umap function. Users can adjust these three hyperparameter-tuning settings to obtain a satisfying two-dimensional embedding. By tuning these hyperparameters, users might find better clustering among behaviours, which may inform the decision whether or not to adjust the original behaviour labels. Each time a parameter is adjusted, the plot will automatically update using the new parameter settings (the time needed for updating depends on the number of segments in the dataset).
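For readers who prefer to experiment with these hyperparameters outside the Shiny app, the same embedding can be computed directly with the umap package. This is a minimal sketch, assuming df_time and df_freq exist as described above and contain only numeric feature columns:

```r
# Minimal sketch: compute a two-dimensional UMAP embedding of the combined
# feature set, setting the three hyperparameters exposed by the Shiny app.
library(umap)
cfg <- umap.defaults
cfg$n_neighbors <- 15          # balance of local vs. global structure
cfg$min_dist    <- 0.1         # how tightly points may be packed together
cfg$metric      <- "euclidean" # distance metric between data points
emb <- umap(as.matrix(cbind(df_time, df_freq)), config = cfg)
head(emb$layout)  # one (x, y) coordinate pair per ACC segment
```

The rows of emb$layout can then be plotted and coloured by label_vec to reproduce the app's behaviour-coloured scatter plot.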
Using the default settings on the white stork data, we can see that the different behaviours generally separate well. Only standing (labelled STND) and sitting (SITTING) show some overlap. These two behaviours are both static; therefore, only features representative of static postures can contribute to differentiating between the two. Among these “static posture” features, y_mean and z_mean are probably the most important candidates for distinguishing between the two behaviours, based on the distribution patterns in the earlier feature density plots. We have seen above in Fig. 7 that these two features still exhibit some overlap in their distributions, which inevitably leads to overlap in UMAP.
Departing from the plot created in UMAP calculation and tuning, the drop-box allows selecting the feature whose values will be depicted on the plot. In the resulting plot, the different behaviours are distinguished by different symbol shapes, with the colour of the symbols depicting the feature values. This tool thus allows one to evaluate the contribution of individual features to distinguishing between behaviours. For instance, choosing “ODBA” from the drop-box, we see that active flight (A_FLIGHT) has markedly higher ODBA values than any other behaviour, followed by walking (WALK). Therefore, ODBA may be a useful feature to include in the ultimate classification model. Conversely, when selecting “x_freqmain” we see an almost random distribution of values across the different behaviours, suggesting that this feature does not provide valuable information for behaviour classification.
In this panel, the user can choose which features to input into UMAP by ticking the checkboxes. This function can be used in combination with the results from the feature selection function, detailed above. Indeed, it is advised to start with selecting those features that have been identified by the filter_wrapper() feature-selection function. For the case of the stork data, when choosing only the five features y_mean, z_mean, y_sd, z_sd and ODBA and pressing submit, one can see that this very limited set of features is able to quite successfully differentiate between behaviours.
After feature selection and visualisation (including potential reclassification of behaviour types), the user can train a supervised machine learning model (XGBoost in this package) with the selected, most relevant features through function train_model. Usually, the construction and evaluation of supervised machine learning models includes three steps: (i) machine learning model hyperparameter tuning by cross-validation, (ii) model training with the optimal hyperparameter set, and (iii) evaluating model performance through validation with a test dataset (please note that the hyperparameters to be tuned in step (i) have no relationship with the hyperparameters in the UMAP model mentioned above under feature visualisation). Function train_model is a wrapper function that utilizes relevant functions from the “caret” package to automatically conduct the three above steps for model construction and evaluation.
Four arguments can be set in train_model. Which features to use for model building is set by “df”, which in the following example is set to “selection$features[1:6]” (i.e. the first six selected features from the feature selection procedure). The “vec_label” argument is used to pass on a vector of behaviour types. How to select the hyperparameter set is determined by “hyper_choice”, which has two options: “defaults” will let XGBoost use its default hyperparameters, while “tune” will run repeated cross-validations to find the best set (note that the settings for the hyperparameters inside this function are based on our previous experience with a range of different ACC datasets (Hui Yu et al, 2020, under review) and are set at: nrounds = 10, max_depth = c(2, 3, 4, 5, 6), eta = c(0.01, 0.1, 0.2, 0.3), gamma = c(0, 0.1, 0.5), colsample_bytree = 1, min_child_weight = 1, subsample = 1). Finally, “train_ratio” determines the percentage of data used to train the model, the remainder of the data being used for model validation.
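Putting the four arguments together, a call on the stork example could look as follows (the name of the returned object is our choice; “selection” is assumed to hold the output of the feature-selection step and label_vec the behaviour labels):

```r
# Train an XGBoost classifier on the six top-ranked features, tuning
# hyperparameters by repeated cross-validation and holding out 25% of the
# data for validation.
model_output <- train_model(df = selection$features[1:6],
                            vec_label = label_vec,
                            hyper_choice = "tune",
                            train_ratio = 0.75)
```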
## Confusion Matrix and Statistics
##
## Reference
## Prediction A_FLIGHT P_FLIGHT SITTING STND WALK
## A_FLIGHT 14 0 0 0 0
## P_FLIGHT 0 22 2 0 1
## SITTING 0 0 62 3 1
## STND 1 2 3 203 4
## WALK 4 0 1 9 103
##
## Overall Statistics
##
## Accuracy : 0.9287
## 95% CI : (0.9004, 0.9511)
## No Information Rate : 0.4943
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8924
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A_FLIGHT Class: P_FLIGHT Class: SITTING Class: STND
## Sensitivity 0.73684 0.91667 0.9118 0.9442
## Specificity 1.00000 0.99270 0.9891 0.9545
## Pos Pred Value 1.00000 0.88000 0.9394 0.9531
## Neg Pred Value 0.98812 0.99512 0.9837 0.9459
## Prevalence 0.04368 0.05517 0.1563 0.4943
## Detection Rate 0.03218 0.05057 0.1425 0.4667
## Detection Prevalence 0.03218 0.05747 0.1517 0.4897
## Balanced Accuracy 0.86842 0.95468 0.9504 0.9494
## Class: WALK
## Sensitivity 0.9450
## Specificity 0.9571
## Pos Pred Value 0.8803
## Neg Pred Value 0.9811
## Prevalence 0.2506
## Detection Rate 0.2368
## Detection Prevalence 0.2690
## Balanced Accuracy 0.9510
## xgbTree variable importance
##
## Overall
## y_variance 100.00
## y_max 91.77
## z_range 58.78
## z_min 54.92
## z_mean 12.63
## y_mean 0.00
While the function is running, it prints updates on the contribution of the selected features to the console. The ultimate output consists of four parts. The first is a confusion matrix, depicting how well the final behaviour classification model predicts the different behaviours based on the validation part of the data set only (i.e. 25% of the data set in our stork example, using a train_ratio of 0.75). In this table, with the observed behaviours organised in columns and the predicted behaviours in rows, the correct predictions appear on the diagonal and all wrong predictions off the diagonal. The overall performance statistics are presented next; their meaning is explained in detail at https://topepo.github.io/caret/measuring-performance.html. The third part of the output, statistics by class, presents a range of performance statistics for the individual behavioural categories, which are also explained in detail at the same link. Finally, the importance of the various features in producing the behaviour classification model is presented. As can be seen in this stork example, the y_mean feature does not contribute to the ultimate classification model, suggesting that this feature could potentially be removed from the feature set.
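The reported overall accuracy can be recomputed by hand from the confusion matrix above: the correct (diagonal) predictions divided by the total number of validation segments.

```r
# Rebuild the confusion matrix printed above (predictions in rows,
# observed behaviours in columns) and recompute the overall accuracy.
behaviours <- c("A_FLIGHT", "P_FLIGHT", "SITTING", "STND", "WALK")
cm <- matrix(c(14,  0,  0,   0,   0,
                0, 22,  2,   0,   1,
                0,  0, 62,   3,   1,
                1,  2,  3, 203,   4,
                4,  0,  1,   9, 103),
             nrow = 5, byrow = TRUE,
             dimnames = list(prediction = behaviours, reference = behaviours))
sum(diag(cm)) / sum(cm)  # 404 / 435 = 0.9287, matching the reported Accuracy
```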
Another way of calculating and visualising the performance of the behavioural classification model makes use of cross-validation, using the function plot_confusion_matrix. In this case, the entire data set is randomly partitioned into five parts. In five consecutive steps, each of the five parts is used once as a validation set, while the remaining four parts are used for model training. This procedure thus resembles a five-fold “classification model training and validation” with a train_ratio of 0.8, except that here the data set is systematically divided and every point in the data set is used for validation at some stage (see the function createFolds in “caret” for more details). Thus, after all five training and validation rounds, every behavioural observation also has an associated predicted behaviour; these are stored in the data frame returned by plot_confusion_matrix, in addition to a plot of the confusion matrix.
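The five-fold partitioning described above can be sketched directly with caret::createFolds; plot_confusion_matrix performs the equivalent loop internally. This is an illustrative sketch, assuming df_time and label_vec as before:

```r
# Manual sketch of the 5-fold cross-validation that plot_confusion_matrix
# automates: each fold serves exactly once as the validation set.
library(caret)
folds <- createFolds(label_vec, k = 5)  # stratified row indices, one set per fold
for (fold in folds) {
  train_x <- df_time[-fold, ]; train_y <- label_vec[-fold]
  valid_x <- df_time[fold, ];  valid_y <- label_vec[fold]
  # ... train a model on (train_x, train_y), predict valid_x,
  # and store the predictions alongside valid_y ...
}
```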
pred <- plot_confusion_matrix(df = df_time, vec_label = label_vec)
Figure 9. Confusion matrix plot of 5-fold cross-validation results. The dots in the graph are coloured according to the classification results, with blue and red symbols denoting correct and incorrect classifications, respectively. The number of occurrences of each observation and prediction pair is labelled.
head(pred) # the first six lines of the returned data frame
## obs pre id
## 1 A_FLIGHT P_FLIGHT 1
## 2 A_FLIGHT A_FLIGHT 2
## 3 A_FLIGHT A_FLIGHT 3
## 4 A_FLIGHT A_FLIGHT 4
## 5 A_FLIGHT A_FLIGHT 5
## 6 A_FLIGHT WALK 6
Using the predictions from the behaviour classification model, we can now return to the original ACC data to evaluate which ACC signals lead to correct and incorrect classifications, using the function plot_wrong_classifications. This function produces essentially the same dynamic graph, with a near-identical look, as the function plot_acc used earlier. The only difference is that all correct and incorrect predictions (identified using the data frame returned by plot_confusion_matrix) are now marked with a solid and a dotted line, respectively, and annotated with labels for the observed and predicted behaviours at the bottom and top of the graph, respectively.
plot_wrong_classifications(whitestork_acc_sorted, df_result = pred)
Figure 10. Wrong predictions plotted on dynamic graph of ACC data.
To illustrate this function, zoom in to the segments around 1190 to 1210. These segments were scored as standing behaviour, with the dotted lines identifying misclassifications as walking and active flight behaviour. Compared with other standing segments, these misclassified segments show more dynamic movement, suggesting walking and wing flapping; the walking may actually be a correct classification that was not picked up in the behavioural scoring.